Automatic Keyphrase Extraction by Bridging Vocabulary Gap

نویسندگان

  • Zhiyuan Liu
  • Xinxiong Chen
  • Yabin Zheng
  • Maosong Sun
چکیده

Keyphrase extraction aims to select a set of terms from a document as a short summary of the document. Most methods extract keyphrases according to their statistical properties in the given document. Appropriate keyphrases, however, are not always statistically significant or even do not appear in the given document. This makes a large vocabulary gap between a document and its keyphrases. In this paper, we consider that a document and its keyphrases both describe the same object but are written in two different languages. By regarding keyphrase extraction as a problem of translating from the language of documents to the language of keyphrases, we use word alignment models in statistical machine translation to learn translation probabilities between the words in documents and the words in keyphrases. According to the translation model, we suggest keyphrases given a new document. The suggested keyphrases are not necessarily statistically frequent in the document, which indicates that our method is more flexible and reliable. Experiments on news articles demonstrate that our method outperforms existing unsupervised methods on precision, recall and F-measure.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Topical Word Trigger Model for Keyphrase Extraction

Keyphrase extraction aims to find representative phrases for a document. Keyphrases are expected to cover main themes of a document. Meanwhile, keyphrases do not necessarily occur frequently in the document, which is known as the vocabulary gap between the words in a document and its keyphrases. In this paper, we propose Topical Word Trigger Model (TWTM) for keyphrase extraction. TWTM assumes t...

متن کامل

A Refined Methodology for Automatic Keyphrase Assignment to Digital Documents

AbstrAct: Keyphrases precisely express the primary topics and themes of documents and are valuable for cataloging and classification. Manually assigning keyphrases to existing documents is a tedious task; therefore, automatic keyphrase generation has been extensively used to classify digital documents. Existing automatic keyphrase generation algorithms are limited in assigning semantically rele...

متن کامل

Keyphrase Annotation with Graph Co-Ranking

Keyphrase annotation is the task of identifying textual units that represent the main content of a document. Keyphrase annotation is either carried out by extracting the most important phrases from a document, keyphrase extraction, or by assigning entries from a controlled domain-specific vocabulary, keyphrase assignment. Assignment methods are generally more reliable. They provide better-forme...

متن کامل

Semantically Enhanced Automatic Keyphrase Indexing

The goal of this PhD thesis is to elaborate methods for automatic keyphrase indexing with a controlled vocabulary. Keyphrases are single words or multi-word lexemes that concisely and accurately describe the subject or an aspect of the subject discussed in a document. They are widely used in large document collections such as digital libraries and document repositories. They help organize mater...

متن کامل

State of the Art of Automatic Keyphrase Extraction Methods (État de l'art des méthodes d'extraction automatique de termes-clés) [in French]

State of the Art of Automatic Keyphrase Extraction Methods This article presents the state of the art of the automatic keyphrase extraction methods. The aim of the automatic keyphrase extraction task is to extract the most representative terms of a document. Automatic keyphrase extraction methods can be divided into two categories : supervised methods and unsupervised methods. For supervised me...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011